Scalable Distributed Subgraph Enumeration

Authors

  • Longbin Lai
  • Lu Qin
  • Xuemin Lin
  • Ying Zhang
  • Lijun Chang
Abstract

Subgraph enumeration aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph. As the subgraph isomorphism operation is computationally intensive, researchers have recently focused on solving this problem in distributed environments, such as MapReduce and Pregel. Among them, the state-of-the-art algorithm, TwinTwigJoin, is proven to be instance optimal based on a left-deep join framework. However, it still does not scale to large graphs because of the constraints of the left-deep join framework and the requirement that each decomposed component (join unit) be a star. In this paper, we propose SEED, a scalable subgraph enumeration approach for distributed environments. Compared to TwinTwigJoin, SEED returns an optimal solution in a generalized join framework without the constraints of TwinTwigJoin. We use both stars and cliques as join units, and design an effective distributed graph storage mechanism to support such an extension. We develop a comprehensive cost model that estimates the number of matches of any given pattern graph by considering the power-law degree distribution of the data graph. We then generalize the left-deep join framework and develop a dynamic-programming algorithm to compute an optimal bushy join plan. We also consider overlaps among the join units. Finally, we propose clique compression to further improve the algorithm by reducing the number of intermediate results. Extensive performance studies are conducted on several real graphs, one of which contains billions of edges. The results demonstrate that our algorithm outperforms all other state-of-the-art algorithms by more than one order of magnitude.
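
As a rough, single-machine illustration of the join-based idea the abstract describes, the sketch below enumerates triangles by joining matches of a two-leaf star with matches of a single edge. It is not SEED or TwinTwigJoin: there is no distribution, cost model, or join-plan optimization, and the function names (`build_adjacency`, `star_matches`, `enumerate_triangles`) are hypothetical. Vertex IDs are assumed to be comparable, and the ordering a < b < c is an assumed symmetry-breaking rule so that each triangle is reported once.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def star_matches(adj):
    """Yield matches (a, b, c) of a two-leaf star rooted at a.

    Only matches with a < b < c are produced (assumed symmetry breaking).
    """
    for a, neighbors in adj.items():
        larger = sorted(n for n in neighbors if n > a)
        for b, c in combinations(larger, 2):
            yield (a, b, c)


def enumerate_triangles(edges):
    """Join star matches with edge matches on the shared vertices (b, c)."""
    adj = build_adjacency(edges)
    # Second join unit: single edges (b, c) with b < c, kept in a hash set.
    edge_matches = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    # A star match becomes a triangle when its two leaves are also adjacent.
    return sorted(m for m in star_matches(adj) if (m[1], m[2]) in edge_matches)
```

For example, `enumerate_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)])` yields the two triangles on vertices 1, 2, 3 and 2, 3, 4. The star matches are the intermediate results here; the abstract's point about join units and intermediate result size is that this intermediate set can grow very large on high-degree vertices.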

Similar resources

Scalable Subgraph Enumeration in MapReduce

Subgraph enumeration, which aims to find all the subgraphs of a large data graph that are isomorphic to a given pattern graph, is a fundamental graph problem with a wide range of applications. However, existing sequential algorithms for subgraph enumeration fall short in handling large graphs due to the involvement of computationally intensive subgraph isomorphism operations. Thus, some recent ...

Distributed Graph Layout for Scalable Small-world Network Analysis

The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, an...

Improved ISE Identification under Hardware Constraint

The three Instruction Set Extension (ISE) enumeration algorithms described in this paper are Subgraph Enumeration (SE), Subgraph Removal (SR), and Lucky Subgraph Removal (LSR). SE exhaustively enumerates all convex subgraphs of a dataflow graph. SR iteratively finds the highest gain subgraph and then locks the related nodes out of the solution space for the next iteration of the search. Finally...

Scalable distributed service migration via Complex Networks Analysis

With social networking sites providing increasingly richer context, user-centric service creation is expected to follow a similar growth with User-Generated Content. The what-is-often-called User Generated Services paradigm calls for efficient yet scalable solutions for optimally placing service facilities. Typically seen as an instance of the facility location problem, service placement has be...

Efficient Enumeration of Induced Subtrees in a K-Degenerate Graph

In this paper, we address the problem of enumerating all induced subtrees in an input k-degenerate graph, where an induced subtree is an acyclic and connected induced subgraph. A graph G = (V,E) is a k-degenerate graph if every induced subgraph of G has a vertex whose degree is less than or equal to k, and many real-world graphs have small degeneracies, or very close to small degeneracies. Alt...

Journal:
  • PVLDB

Volume 10

Publication date: 2016